Signed-off-by: Lawrence Lane <llane@nvidia.com>
6fa4689 to 71f0756
|-----------|--------|-------------|
| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | ✅ Supported | NVIDIA's scalable post-training library with GRPO, DPO, SFT |
| [Unsloth](https://github.com/unslothai/unsloth) | ✅ Supported | Fast fine-tuning framework with memory optimization |
| [veRL](https://github.com/volcengine/verl) | 🔜 In Progress | Volcano Engine's scalable RL framework |
I don't think veRL is in progress, but maybe someone is working on it?
And I think we can change TRL to say supported now; we are just fixing a minor last-minute change and working on additional docs, e.g. sample reward/step or a potential blog post.
| Library | Status | Description |
|---------|--------|-------------|
| [reasoning-gym](https://github.com/open-thought/reasoning-gym) | ✅ Supported | Procedurally generated reasoning tasks (see `reasoning_gym` resource server) |
| [Aviary](https://github.com/Future-House/aviary) | ✅ Supported | Multi-environment framework for tool-using agents (see `aviary` resource server) |
May be worth saying it's OpenAI Gymnasium compatible (but we should double-check that statement).
Prime Intellect: the library is named verifiers, or Environments Hub, not Prime Intellect itself, imo.
BrowserGym: not sure if anyone is working on this? @cwing-nvidia?
| Name | Demonstrates | Config | README |
| --- | --- | --- | --- |
| Multi Step | Multi-step tool calling | <a href='resources_servers/example_multi_step/configs/example_multi_step.yaml'>example_multi_step.yaml</a> | <a href='resources_servers/example_multi_step/README.md'>README</a> |
| Reasoning Gym | External environment library integration | <a href='resources_servers/reasoning_gym/configs/reasoning_gym.yaml'>reasoning_gym.yaml</a> | <a href='resources_servers/reasoning_gym/README.md'>README</a> |
I thought these don't go in the README because they don't have an HF dataset link; I thought this README table was built automatically based on that somehow.
| Resource Server | Domain | Dataset | Description | Value | Config | Train | Validation | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Aviary (GSM8K) | agent | <a href='https://arxiv.org/abs/2110.14168'>GSM8K</a> | Grade school math with calculator tool via Aviary integration | Improve math reasoning with tool use | <a href='resources_servers/aviary/configs/gsm8k_aviary.yaml'>config</a> | ✓ | - | MIT |
| Aviary (HotPotQA) | agent | <a href='https://aclanthology.org/D18-1259/'>HotPotQA</a> | Multi-hop question answering via Aviary integration | Improve multi-hop reasoning capabilities | <a href='resources_servers/aviary/configs/hotpotqa_aviary.yaml'>config</a> | ✓ | - | Apache 2.0 |
Are we starting to enumerate multiple datasets / env implementations in the README now too? Then we should do the same for math, for example? @bxyu-nvidia
| } | ||
| ``` | ||

Any framework that can read this format can use NeMo Gym rollouts; no native integration required. The following frameworks have documented patterns.
I think it's more complex than this. Don't we already have a training framework integration guide with various requirements, e.g. async OpenAI-compatible, retokenization correction, etc.?